Today I'd like to talk about the project Photon at Databricks. We're going to do a technical deep dive: I'll explain what vectorization is, walk through a few benchmarks, and show you how to build a new engine from its basic building blocks. First, a little about what motivated Photon. We've been watching hardware trends in two dimensions: storage and networks have been getting dramatically faster, but CPUs not so much.
As a result, the CPU tends to be the bottleneck on some workloads, and there's a natural question of where the next level of performance will come from. The second trend is about the data itself: businesses are moving faster and they're loading data without the kind of data modeling that guarantees the data is pristine and clean — whether the columns have consistent data types, for example — so very often the data arrives raw and messy.
Think of the process of one of these records and its full lifecycle: from raw ingestion, and then converting it step by step into curated silver and gold tables. As data moves through this lifecycle it gets progressively refined, finally producing curated datasets — which means the engine sees data of varying degrees of quality along the way. Users don't always have perfect schemas upfront, so there's actually a case to be made for an engine that is both good at raw, messy data and at data with a pristine schema and quality.
So those are the trends that motivate it. Now, where does Photon fit in? At Databricks we have this Delta Engine stack: Photon is one of the layers at the bottom — the native execution engine — and on top of it you can write workloads in SQL, DataFrames, or Koalas. What I want to show you here is that Photon is the execution layer, and this talk is focused on that part.
So what is vectorization, and why does it help with these workloads we talked about? It's a specific technique for executing queries over structured and semi-structured data: you take a query and decompose it into operators and expressions that process vectors of data. Modern CPUs happen to be very good at doing this kind of work: simple loops over vectors exploit the superscalar design and the pipelining of CPUs. Columnar in-memory formats fit this model naturally, and because it's kind of a batch-oriented model, it also enables some runtime adaptivity, which I'll talk about actually later on. Central to all of this is the idea of compute kernels: instead of interpreting an expression row by row, which tends to be slow, each kernel is a simple specialized loop that does only that specific part of the work.
These are the ideas, they all sound great, but it sounds a little nebulous, so let's build a mental model of how you would build a vectorized engine. Given the amount of work involved, you might ask why build one completely from scratch. The microbenchmarks here, on a variety of expressions — numeric expressions, string expressions — show that there's still a lot to be gained: the speedup that you can achieve is in the range of 50 to 60x, and similar gains apply to hash tables for aggregations. We'll use a simple query that can help build intuition. At a high level, the vectorized engine is a tree of operators that pass batches of columnar data to each other. We'll focus on the expression evaluation and aggregation portions in this talk.
Let's start with expression evaluation, focusing on this filter: the predicate that checks whether c2 is smaller than 10. The expression is parsed into an expression tree; each node takes one or more input vectors and produces another vector, which feeds its parent. That brings us back to the idea of compute kernels, so let's look at what compute kernels look like. Each node in the tree can be implemented by code like the code you're seeing here: it takes a vector of input items and produces a vector of output items. The kernel is just a tight loop that iterates over all the items, applies the operation, and writes them to the output.
Of course, real data has NULLs, so let's think about what that means for us. Besides the data input and output vectors, the thing that we've added in now is a NULL vector per column: the kernel has to check whether there's a NULL and skip the operation if any of the inputs is NULL. Adding in the NULLs makes the kernel noticeably slower. But here's the thing: data may sometimes have NULLs and sometimes not. Wouldn't it be nice if there was a way for me to process my mostly non-NULL data at full speed? Each batch tracks whether it has NULLs or not, and we can use this property to dispatch to different kernels at runtime: a kernel that can deal with NULLs, and a faster one for the common case of no NULLs on both sides. You can also specialize further — for example, restrict this to only the left input having NULLs.
The same trick applies to other expressions and other ideas — string encodings, for instance, and other per-batch properties of the data — but the main idea is what we call batch-level adaptivity: it allows you to adapt to the data you're seeing at runtime, rather than relying only on information that you have statically inside of the schema.
This is powerful because users don't need to declare any of these properties upfront; they get the benefits because we do it automatically, batch by batch. Back to the comparison step: for the inequality we can do even better, with another optimization where, since the right-hand side is the constant 10, we can kind of bake it into the kernel as a scalar, as opposed to passing a vector.
This optimization allows us to get even more speed out of the loop. So now we've gone all the way through and evaluated the filter — what do we do with the batch that we were processing?
The result of the filter is evaluated as a new kind of vector: an active rows vector. It's a lazy representation of the filter, just indicating which rows passed. Our batch has the columns c1, c2, c3, g1 and g2; we just tack on this active rows vector and pass the whole batch, including the original data, to the next operator. The reason we do this is there are several benefits. One of them is that you don't have to compact or copy the surviving rows, which would be kind of expensive. It's also helpful in more complicated operators. Why a lazy representation of filters? Active rows become a concept in the engine that shows up throughout the engine, and it turns out to be a handy way of expressing more complex operations like joins. The effect on the kernels is, if we take a look at the loop again, it now iterates only over the rows that are active.
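As a small sketch of that pattern (assumed layout, not Photon's actual batch format): the data columns stay untouched, and downstream kernels loop over the active-row indices only.

```cpp
#include <cstdint>
#include <vector>

// The batch carries an index vector of the rows that passed the filter;
// kernels visit only those indices, so no compaction or copying is needed.
int64_t sum_active_rows(const std::vector<int32_t>& col,
                        const std::vector<int32_t>& active_rows) {
    int64_t total = 0;
    for (int32_t row : active_rows)  // skip filtered-out rows implicitly
        total += col[row];
    return total;
}
```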
With that, let's move on to aggregation. Our query is grouping by g1 and g2 and summing c3, so we build a hash table that has these groups as entries. You can think of g1 and g2 as the keys: we hash them, probe the table with the keys, and once we've found the right bucket, we evaluate the aggregation function and update the aggregation buffers. That's a pretty simple algorithm, and it's easy to implement in a row-wise system. The challenge for us is to think in a columnar way and make every step column-oriented and batch-oriented: we basically go through all the steps of the algorithm and kernelize all the things. Let's see what that means exactly.
When we benchmark this, we see that there are big gains to be achieved with this kind of columnar approach, even for a very simple query — nice speedups, so pay attention.
We start with a batch; only the relevant columns are shown — g1, g2 and c3. I've omitted the filter stuff, because it just adds noise here. Step one: take g1 and g2 and compute the hashes. That's a kernel: computing the hashes column by column to produce a vector of hashes. Step two: look up entries in the hash table. Using the hashes, we will produce another vector that contains pointers to the buckets that we think might be a match. Step three: having identified candidate buckets, we compare the keys against those buckets to identify collisions. This is done in a column by column way: compare g1 against the corresponding buckets, then g2. Some of the pointers are pointing to buckets whose keys don't match — those are the collision cases. The way we resolve these non-matches is a linear probing strategy: advance those pointers to the next bucket, and we'll repeat the comparison step, identifying the non-matches again, until all of the bucket pointers match their rows. Finally, step four: take the c3 input vector and add its values into the corresponding aggregation buffers.
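The four steps above can be sketched column-at-a-time in C++. Everything here — the `Bucket` struct, the open-addressing table, the names — is my illustrative simplification (one key column, sum only), not Photon's actual hash table:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct Bucket { bool used = false; int32_t key = 0; int64_t sum = 0; };

void hash_agg_batch(const std::vector<int32_t>& g,   // key column
                    const std::vector<int32_t>& c,   // value column
                    std::vector<Bucket>& table) {    // power-of-two size
    const size_t n = g.size();
    const size_t mask = table.size() - 1;

    // Step 1: a kernel computes a vector of hashes from the key column.
    std::vector<size_t> hashes(n);
    for (size_t i = 0; i < n; ++i) hashes[i] = std::hash<int32_t>{}(g[i]);

    // Step 2: turn the hashes into a vector of candidate bucket positions.
    std::vector<size_t> bucket(n);
    for (size_t i = 0; i < n; ++i) bucket[i] = hashes[i] & mask;

    // Step 3: compare keys against candidate buckets to find collisions;
    // resolve non-matches with linear probing and repeat until all match.
    std::vector<uint8_t> unmatched(n, 1);
    bool any_unmatched = true;
    while (any_unmatched) {
        any_unmatched = false;
        for (size_t i = 0; i < n; ++i) {
            if (!unmatched[i]) continue;
            Bucket& b = table[bucket[i]];
            if (!b.used)            { b.used = true; b.key = g[i]; unmatched[i] = 0; }
            else if (b.key == g[i]) { unmatched[i] = 0; }
            else                    { bucket[i] = (bucket[i] + 1) & mask;
                                      any_unmatched = true; }  // probe onward
        }
    }

    // Step 4: update the aggregation buffers from the input vector c.
    for (size_t i = 0; i < n; ++i) table[bucket[i]].sum += c[i];
}

// Tiny driver: aggregate one batch, then return the sum for a given key.
int64_t demo_group_sum(int32_t key) {
    std::vector<Bucket> table(8);
    hash_agg_batch({1, 2, 1, 2, 1}, {10, 20, 30, 40, 50}, table);
    for (const Bucket& b : table)
        if (b.used && b.key == key) return b.sum;
    return -1;
}
```

Note how each step is its own loop over the whole batch, which is exactly what "kernelize all the things" means.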
Let's look at what these kernels might look like — here's a code snippet for us to walk through. Everywhere else in the engine we work in columnar format, but when dealing with hash tables it's often better to represent the buckets as rows. So this is what we call a Mixed Column/Row Kernel: it takes a column of input values and a representation of buckets whose values are sprayed across memory as rows. The code itself is pretty straightforward; the interesting part is the extra level of the indirection: for each active row we apply the aggregation function to the column input and the bucket's buffer. One thing to keep in mind when dealing with code like this is that hash tables might become large, at which point performance is dominated by cache miss latencies.
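A hedged sketch of such a mixed column/row kernel, with an illustrative layout (not Photon's actual representation): the value column and active-rows list are columnar, but each update goes through a per-row bucket pointer into row-oriented storage.

```cpp
#include <cstdint>
#include <vector>

// Columnar in, row-oriented out: chase the bucket pointer, then update
// the aggregation buffer stored in that row. On a large table this
// pointer chase is where the cache misses happen.
void sum_into_buckets(const std::vector<int32_t>& col,       // columnar input
                      const std::vector<int64_t*>& buckets,  // per-row bucket ptr
                      const std::vector<int32_t>& active_rows) {
    for (int32_t row : active_rows)
        *buckets[row] += col[row];  // one extra level of indirection
}

// Tiny driver: rows 0 and 2 land in the same bucket.
int64_t mixed_kernel_demo() {
    std::vector<int64_t> buffers{0, 0};
    std::vector<int64_t*> buckets{&buffers[0], &buffers[1], &buffers[0]};
    sum_into_buckets({5, 7, 9}, buckets, {0, 1, 2});
    return buffers[0];  // 5 + 9
}
```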
Now that we've seen some of the building blocks, let's talk about End-to-End Performance — how the techniques we've just discussed translate into an improvement in throughput when we run a benchmark end-to-end. The microbenchmarks I've shown you so far were isolated operations; here I want to focus on read queries. The chart is showing you a simple group-by on an integer, so a single column aggregation over the TPC-DS store_sales table, and we achieve a substantial speedup. For customers who were interested and have suitable workloads, we expect these gains to carry over end-to-end for suitable queries, and that we'll be able to keep improving from here.
To wrap up, we talked a bit about the basic building blocks of a vectorized engine: why vectorization works, because CPUs are very good at executing tight loops over vectors of data; batch-level adaptivity, which lets you adapt to the data you're seeing at runtime; the lazy filter representation with active rows; and mixed column/row operations for hash tables.